Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip serializing cardinality estimates in FeatureDistributions #447

Merged
merged 9 commits into from
Dec 18, 2019

Conversation

winterslu
Copy link
Contributor

Related issues
Tokens of text field are printing out while internal usage.

Describe the proposed solution
Ignore cardinality field in serialization

Additional context
Previous PR #420 & #438

Copy link
Collaborator

@TuanNguyen27 TuanNguyen27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, discussed internally.

@winterslu winterslu removed their assignment Dec 11, 2019
@codecov
Copy link

codecov bot commented Dec 11, 2019

Codecov Report

Merging #447 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #447      +/-   ##
==========================================
+ Coverage   86.93%   86.93%   +<.01%     
==========================================
  Files         337      337              
  Lines       11096    11102       +6     
  Branches      362      364       +2     
==========================================
+ Hits         9646     9652       +6     
  Misses       1450     1450
Impacted Files Coverage Δ
...om/salesforce/op/filters/FeatureDistribution.scala 98.66% <100%> (+0.11%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c6110b...d6dd9cb. Read the comment docs.

Copy link
Collaborator

@tovbinm tovbinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

Copy link
Contributor

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tovbinm tovbinm changed the title Klu/ignore serializing cardinality Ignore serializing cardinality Dec 14, 2019
FeatureDistributionType.Scoring)
FeatureDistribution.toJson(Seq(fd1)) shouldNot include ("cardEstimate")

val json =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already computed this string on Line 249, you can store it in a variable.

Copy link
Contributor Author

@winterslu winterslu Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that json doesn't contains cardEstimate obj, which ignored while serializing, i construct another example with it, which should be tested that fromJson can still deserialize this obj without ignoring it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can generate it too:

val jsonWithCardEstimate = Serialization.write(fd1)(DefaultFormats +
      EnumEntrySerializer.json4s[FeatureDistributionType](FeatureDistributionType))

|"summaryInfo":[],"moments":{"m0":1,"m1":1.0,"m2":0.0,"m3":0.0,"m4":0.0},
|"cardEstimate":{"valueCounts":{"foo":1,"bar":2}},"type":"Scoring"}]
|""".stripMargin
FeatureDistribution.fromJson(json) match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more concisely:

FeatureDistribution.fromJson(json) shouldBe Success(Seq(fd1))

Copy link
Contributor

@gerashegalov gerashegalov Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need the test case for being able to deserialize where cardEstimate is not present in JSON

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this test case is doing so

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I meant the complementary case then without cardEstimate in JSON

@gerashegalov gerashegalov changed the title Ignore serializing cardinality Skip serializing cardinality estimates in FeatureDistributions Dec 18, 2019
@gerashegalov gerashegalov merged commit 3d51d5d into master Dec 18, 2019
@gerashegalov gerashegalov deleted the klu/ignore-serializing-cardinality branch December 18, 2019 18:48
@nicodv nicodv mentioned this pull request Jun 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants